Topological Data Analysis is a novel approach that is useful whenever data can be described by topological structures such as graphs. This paper investigates whether such a tool can be used to define a set of descriptors for pattern recognition and machine learning tasks. Specifically, we consider a supervised learning problem whose final goal is to predict a protein's physiological function from its residue contact network. Folded proteins can be described effectively by graphs, which makes them a useful case study for assessing the effectiveness of Topological Data Analysis in pattern recognition tasks. Experiments conducted on a subset of the Escherichia coli proteome using two different classification systems show that descriptors derived from Topological Data Analysis (namely, the sequence of Betti numbers) lead to classification performance comparable with that of descriptors derived from widely known centrality measures on the protein function prediction problem. Further benchmarking tests suggest that some information is retained despite the heavy compression intrinsic to casting a protein into its Betti numbers.
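As a rough illustration of the kind of descriptor the abstract refers to, the sketch below computes the first two Betti numbers of a toy contact graph. It is not the authors' pipeline: this record does not say how a simplicial complex is built from the residue contact network, so the example assumes the graph is treated as a 1-dimensional complex, where b0 is the number of connected components and b1 = |E| - |V| + b0 counts independent cycles. The function name `graph_betti_numbers` and the toy graph are hypothetical.

```python
def graph_betti_numbers(num_nodes, edges):
    """Return (b0, b1) for an undirected graph viewed as a 1-dimensional complex.

    Assumption: nodes are labelled 0..num_nodes-1 and no higher-dimensional
    simplices (filled triangles, tetrahedra, ...) are added.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Ignore self-loops and duplicate edges so the edge count is well defined.
    unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}

    components = num_nodes
    for u, v in unique_edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    b0 = components                             # connected components
    b1 = len(unique_edges) - num_nodes + b0     # independent cycles
    return b0, b1


# Toy "contact network": a 4-cycle of residues plus one isolated residue.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
toy_betti = graph_betti_numbers(5, toy_edges)
```

In a richer construction, such as a clique complex built on the same graph, higher-order Betti numbers can also be non-zero, and a fixed-length vector of these counts could then be fed to a standard classifier.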

Supervised approaches for protein function prediction by topological data analysis / Martino, Alessio; Rizzi, Antonello; Mascioli, Fabio Massimo Frattale. - 2018:(2018), pp. 1-8. (Paper presented at the International Joint Conference on Neural Networks (IJCNN) 2018, held in Rio de Janeiro, Brazil) [10.1109/IJCNN.2018.8489307].

Supervised approaches for protein function prediction by topological data analysis

Martino, Alessio; Rizzi, Antonello; Mascioli, Fabio Massimo Frattale
2018

International Joint Conference on Neural Networks (IJCNN) 2018
protein function prediction; topological data analysis; Betti numbers; support vector machines
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1202073